Method and apparatus for decoding a coded digital audio signal which is arranged in frames containing headers

ABSTRACT

With audio data reduction on the basis of ISO/IEC standard 11172-3, a frame length varying by 8 bits is used at a sampling frequency of 44.1 kHz in order to arrive, on average, at a particular fixed data rate. The lengthening of a data frame is signalled by a padding bit in the header of the frames. The invention dispenses with evaluation of the padding bit. Instead, the mean frame length L is calculated and rounded down to the next integer. For the subsequent frame, it is first established whether the expected sync word for this frame appears. If it does, this frame is decoded without taking into account the padding bit; if it does not, the decoding of the frame is started 8 bits later, again without taking into account the padding bit.

This application claims the benefit, under 35 U.S.C. § 365 of International Application PCT/EP02/11388, filed Oct. 11, 2002, which was published in accordance with PCT Article 21(2) on May 1, 2003 in English and which claims the benefit of European patent application No. 01250372.8, filed Oct. 23, 2001 and European patent application No. 02090082.5 filed Mar. 1, 2002.

The invention relates to a method and an apparatus for decoding a coded digital audio signal which is arranged in frames containing headers.

PRIOR ART

When using audio data reduction on the basis of ISO/IEC standards 11172-3 and 13818-3, a frame length varying by 8 bits is used at a sampling frequency of 44.1 kHz in order to arrive, on average, at a particular fixed data rate (e.g. 128 000 bits/sec). The ‘lengthening’ of a data frame is signalled by the “padding bit” in the header of a frame. This method is described in more detail in EP-A-0402973. Each frame also begins with a sync word.

INVENTION

The evaluation of this padding bit in the decoder can cause difficulties. By way of example, in highly optimized decoders, the digital signal processors (DSP) they contain require very sparing use of storage space. Since, however, the header in a frame is read at the start of decoding of the frame, but the value of the padding bit is not needed until right at the end of decoding of this frame, in a DSP implementation an entire storage location (an integer value of, by way of example, several bytes in length) is typically wasted on merely storing the value of the padding bit.

It would be possible to achieve a reduction in the required storage space by dispensing with the ‘padding’, i.e. the frame lengths would always be kept constant even at a sampling frequency of 44.1 kHz. However, a particular fixed data rate of, by way of example, 128 000 bits/sec is then no longer obtained, but rather a value which is 0.23% lower. A decoder which relies upon a constant frame length always being used even at a sampling frequency of 44.1 kHz would no longer be compatible with the aforementioned ISO/IEC standard, however.
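By way of illustration only, the figure of 0.23% can be checked with a short calculation. The following Python sketch is not part of the original disclosure. It assumes N=1152 samples per frame and slots of 8 bits, as used in the exemplary embodiments, and all variable names are chosen here purely for illustration.

```python
from fractions import Fraction

N = 1152          # samples per frame (assumed, as in the exemplary embodiments)
R = 128_000       # nominal data rate in bits/sec
FS = 44_100       # sampling frequency in Hz
SLOT_BITS = 8     # length of one slot in bits

# Exact mean frame length in slots, and the constant length without padding.
exact_slots = Fraction(N * R, FS * SLOT_BITS)
fixed_slots = exact_slots.numerator // exact_slots.denominator

# Data rate actually obtained if every frame had exactly fixed_slots slots.
fixed_rate = Fraction(fixed_slots * SLOT_BITS * FS, N)
deficit_percent = float((R - fixed_rate) / R * 100)

print(fixed_slots, float(fixed_rate), round(deficit_percent, 2))
# prints: 417 127706.25 0.23
```

With a constant frame length of 417 slots, the data rate obtained is 127 706.25 bits/sec instead of 128 000 bits/sec.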

The invention is based on the object of specifying a method which allows less storage space to be used but maintains the compatibility with the ISO/IEC standards 11172-3 and 13818-3 or with similar standards. This object is achieved by the method specified in Claim 1. A decoder using this method is specified in Claim 5.

In accordance with the invention, the data frames of varying length are evaluated on the basis of the respective length, but evaluation of the padding bit from the header is avoided. Since the value of the padding bit is normally used to ascertain the exact position of the start of the next frame, the invention involves ascertaining the start of the next frame in another way, namely by calculating a mean frame length and a rounding-down or rounding-up of this mean frame length to the closest integer byte values for the received frames.

The advantage is that the value of the padding bit does not need to be stored for the entire time taken for decoding a frame, and hence storage space is saved.

In principle, the inventive method relates to the decoding of a coded digital audio signal which is arranged in frames containing headers, where the header in a frame contains a respective information item regarding whether this frame has a standard length or a length which differs therefrom for some of the frames, and where the frames contain a respective sync word, having the following steps:

- the length-variation information regarding the respective frame length is not stored or evaluated;
- the approximate start of the next frame is determined using the following formula:
  L = N*R/fs/SL,
  where L is equal to the length of the frames, N is equal to the number of samples per frame, R is equal to the total data rate, fs is equal to the sampling frequency, and SL is equal to the stipulated subunit for indicating the frame length;
- L is rounded down to the next integer of subunits SL;
- for the subsequent frame, it is first established whether the expected sync word for this frame appears;
- if the expected sync word for this frame appears, this subsequent frame is decoded without taking into account the length-variation information;
- if the expected sync word for this frame does not appear, the decoding of this subsequent frame is started one subunit SL later without taking into account the length-variation information.

In principle, the inventive apparatus relates to a decoder for decoding a coded digital audio signal which is arranged in frames containing headers, where the header in a frame contains a respective information item regarding whether this frame has a standard length or a length which differs therefrom for some of the frames, and the frames contain a respective sync word, where, for ascertaining the frame length, the length-variation information regarding the respective frame length is not stored or evaluated, and where the apparatus contains:

- means for decoding the audio signal;
- a frame-start estimator in which the approximate start of the next frame is determined using the following formula:
  L = N*R/fs/SL,
  where L is equal to the length of the frames, N is equal to the number of samples per frame, R is equal to the total data rate, fs is equal to the sampling frequency, and SL is equal to the stipulated subunit, and in which L is rounded down to the next integer of subunits SL;
- a sync-word checker which, for the subsequent frame, first establishes whether the expected sync word for this frame appears, where, if the expected sync word for this frame appears, this subsequent frame is decoded in the decoding means without taking into account the length-variation information, and, if the expected sync word for this frame does not appear, the decoding of this subsequent frame is started in the decoding means one subunit SL later without taking into account the length-variation information.

Instead of evaluating the sync word, another known and expected data pattern can also be evaluated.

DRAWINGS

Exemplary embodiments of the invention are described with reference to the drawings, in which:

FIG. 1 shows two successive data frames having the same length;

FIG. 2 shows two successive data frames having different lengths;

FIG. 3 shows a decoder in accordance with the invention.

EXEMPLARY EMBODIMENTS

In data-reducing coding and decoding methods for audio signals, such as in ISO/IEC 11172-3 (MPEG audio), the coded audio signals are stored or transmitted in data frames which respectively contain a fixed number N of audio samples, e.g. 1152 samples. The data frames have, in principle, a fixed length which is a multiple of a basic unit, which is called a ‘slot’ in ISO/IEC 11172-3 and has a length of 8 bits in the ‘layer 2’ and ‘layer 3’ variants.

In FIG. 1, each of the successive frames of the same length of L bytes has a header Hd which contains a sync word SY. The size of the subunit SL is 1 byte = 8 bits in this example.

If audio signals having sampling frequencies fs of 32 000 Hz or 48 000 Hz are used, then the relationship between the total data rate R (in bits/sec) and the frame length L (in slots) is as follows:

L = N*R/fs/8  (1)

Example: N=1152 samples; R=128 000 bits/sec; fs=48 000 Hz gives L=384 slots of 8 bits each.

If, however, a sampling frequency of 44 100 Hz is used, then non-integer values for L are produced in (1). In this way, the start of the next frame is determined only approximately. Example:

N=1152 samples; R=128 000 bits/sec; fs=44 100 Hz gives L=417.9591837 slots of 8 bits each.
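The examples for formula (1) can be reproduced with exact rational arithmetic. The following Python sketch is illustrative only; the function name `frame_length_slots` is a hypothetical helper and not part of the disclosure.

```python
from fractions import Fraction

def frame_length_slots(n_samples, rate_bps, fs_hz, slot_bits=8):
    """Formula (1): exact frame length L in slots, as a rational number."""
    return Fraction(n_samples * rate_bps, fs_hz * slot_bits)

for fs in (32_000, 44_100, 48_000):
    exact = frame_length_slots(1152, 128_000, fs)
    whole, rest = divmod(exact, 1)   # rounded-down length and fractional part
    print(fs, "Hz:", whole, "slots, fractional part", rest)
```

At 44 100 Hz the exact value is 20480/49 = 417 + 47/49 slots. If only frame lengths of 417 and 418 slots are used, the mean is therefore met when 47 out of every 49 frames are lengthened by one slot.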

However, since a frame can only have an integer number of slots, a frame length which varies by 1 slot (=8 bits) is used at a sampling frequency of 44.1 kHz in order to arrive, on average, at a particular fixed data rate (e.g. R=128 000 bits/sec) and is signalled, as described above, using the padding bit in the header. When the result from formula (1) is rounded down, the correct frame start is often obtained for a sampling frequency of 44.1 kHz, namely for those frames which have not been lengthened by 1 slot. Often, however, an incorrect value is also obtained for the frame start. If the next frame starts to be decoded at this incorrect point, then an error is obtained, since the sync word to be expected at the start of the frame obviously does not appear.

Normally, decoders then switch to an error recovery mode and start a fresh, complex search for a sync word. This typically produces a fault in the decoded output signal.

In FIG. 2, the first frame is one unit SL longer than the second frame, i.e. L+1 bytes. If decoding starts at the place pointed at by the pointer LPOI for the calculated and rounded-down variable L, then no sync word is found at that place. For this reason, a check is carried out one unit SL further on to determine whether a sync word is present, and this sync word is found at that place.
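The check at the pointer LPOI and one unit SL further on can be sketched as follows. This Python fragment is an illustration under stated assumptions, not the implementation of the decoder: it assumes a byte-aligned stream, SL = 1 byte, and a sync word consisting of twelve set bits (0xFFF) at the start of the header, as in MPEG-1 audio. The names `has_sync` and `resolve_frame_start` are hypothetical.

```python
from typing import Optional

def has_sync(data: bytes, pos: int) -> bool:
    """True if the assumed 12-bit sync word 0xFFF starts at byte offset pos."""
    if pos < 0 or pos + 2 > len(data):
        return False
    word = (data[pos] << 8) | data[pos + 1]
    return (word & 0xFFF0) == 0xFFF0

def resolve_frame_start(data: bytes, lpoi: int) -> Optional[int]:
    """Check LPOI first, then one subunit (1 byte, assumed) further on."""
    if has_sync(data, lpoi):
        return lpoi
    if has_sync(data, lpoi + 1):
        return lpoi + 1
    return None  # neither position matches; a decoder would fall back to error recovery

# FIG. 2 situation in miniature: rounded-down L = 4, first frame is L+1 = 5 bytes long.
stream = bytes([0xFF, 0xFB, 0x00, 0x00, 0x00, 0xFF, 0xFB, 0x00, 0x00])
print(resolve_frame_start(stream, 4))  # prints: 5
```

The padding bit is never read in this sketch; only the position of the sync word decides where the next frame begins.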

The invention therefore proposes, when decoding encoded signals having the sampling frequency 44 100 Hz or 22 050 Hz:

- not storing or evaluating the padding bit;
- determining the approximate start of the next frame using the formula (1);
- rounding down the result from (1) to the next integer;
- for the subsequent frame, first establishing whether the expected sync word or another known data pattern appears;
- if this is the case, decoding this subsequent frame without taking into account the padding bit;
- if this is not the case, starting the decoding of this subsequent frame one slot later without taking into account the padding bit.
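The steps listed above can be illustrated by walking through a synthetic stream of frames. The following Python sketch is a minimal illustration under assumptions that are not taken from the disclosure: the stream is byte-aligned, each synthetic frame consists of the two bytes 0xFF 0xFB followed by zero bytes, and the helper names `sync_at`, `build_stream` and `frame_starts` are hypothetical.

```python
def sync_at(data: bytes, pos: int) -> bool:
    """True if an assumed 12-bit sync word (0xFFF) starts at byte offset pos."""
    return pos + 1 < len(data) and data[pos] == 0xFF and (data[pos + 1] & 0xF0) == 0xF0

def build_stream(lengths):
    """Synthetic test data: each frame is 0xFF 0xFB plus zero bytes, of the given length."""
    return b"".join(b"\xff\xfb" + bytes(n - 2) for n in lengths)

def frame_starts(data, n_samples=1152, rate_bps=128_000, fs_hz=44_100, slot_bits=8):
    """Locate frame starts without reading the padding bit."""
    base = (n_samples * rate_bps) // (fs_hz * slot_bits)  # formula (1), rounded down
    starts = []
    pos = 0
    while sync_at(data, pos):
        starts.append(pos)
        pos += base                  # approximate start of the next frame
        if not sync_at(data, pos):
            pos += 1                 # start one slot later instead
    return starts

stream = build_stream([418, 418, 417, 418])
print(frame_starts(stream))  # prints: [0, 418, 836, 1253]
```

The differences between successive start positions reproduce the frame lengths 418, 418 and 417, although no header field other than the sync word is inspected.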

FIG. 3 shows an inventive decoder which receives a coded audio signal EAS which is supplied to a bit stream deformatter BSD. BSD exchanges corresponding data with a frame-start estimator FSE. The frame start address estimated therein or a corresponding pointer LPOI is used to establish, in a sync word checker SYCH, whether there is a sync word at the appropriate point in the data stream. If this is so, a decoder stage DEC and/or the bit stream deformatter BSD receives the information which prompts the further processing or decoding of the next data frame to start at that point. The audio signal decoded in the frequency domain is supplied by the decoder stage DEC to a windowing stage DW which multiplies portions of the audio signal using a synthesis filter, for example, converts them to the time domain and outputs a decoded audio signal DAS.

The invention can also be used for related applications in which a non-integer result from (1) causes a variation in the frame length and said variation is indicated using an information item similar to a ‘padding bit’.

1. Method for decoding a coded digital audio signal which is arranged in frames containing headers, where the header in a frame contains a respective information item regarding whether this frame has a standard length or a length which differs therefrom for some of the frames, and where the frames contain a respective sync word, comprising the following steps: determining the approximate start of the next frame using the following formula: L = N*R/fs/SL, where L is equal to the length of the frames, N is equal to the number of samples per frame, R is equal to the total data rate, fs is equal to the sampling frequency, SL is equal to the stipulated subunit for indicating the frame length; rounding L down to the next integer of subunits SL; establishing, for the subsequent frame, whether the expected sync word for this frame appears; if the expected sync word for this frame appears, decoding this subsequent frame without taking into account the length-variation information; if the expected sync word for this frame does not appear, starting the decoding of this subsequent frame one subunit later without taking into account the length-variation information, wherein the steps are performed without storing or evaluating the length-variation information regarding the respective frame length.
2. Method according to claim 1, where the parameters for calculating the formula for the approximate frame start comprise known parameters in a transmission system.
3. Method according to claim 2, where at least one of the parameters is transmitted in the header of frames.
4. Method according to claim 1, where, instead of establishing whether an expected sync word appears, it is established whether another known pattern appears in the next frame.
5. Apparatus for decoding a coded digital audio signal which is arranged in frames containing headers, where the header in a frame contains a respective information item regarding whether this frame has a standard length or a length which differs therefrom for some of the frames, and the frames contain a respective sync word, where, for ascertaining the frame length, the length-variation information regarding the respective frame length is not stored or evaluated, and where the apparatus comprises: means for decoding the audio signal; a frame-start estimator in which the approximate start of the next frame is determined using the following formula: L = N*R/fs/SL, where L is equal to the length of the frames, N is equal to the number of samples per frame, R is equal to the total data rate, fs is equal to the sampling frequency, SL is equal to the stipulated subunit, and in which L is rounded down to the next integer of subunits SL; a sync-word checker which, for the subsequent frame, first establishes whether the expected sync word for this frame appears, where, if the expected sync word for this frame appears, this subsequent frame is decoded in the decoding means without taking into account the length-variation information, and, if the expected sync word for this frame does not appear, the decoding of this subsequent frame is started in the decoding means one subunit SL later without taking into account the length-variation information.